MedFusionGAN: multimodal medical image fusion using an unsupervised deep generative adversarial network

Purpose This study proposed an end-to-end unsupervised medical fusion generative adversarial network, MedFusionGAN, to fuse computed tomography (CT) and high-resolution isotropic 3D T1-Gd magnetic resonance imaging (MRI) image sequences into an image with CT bone structure and MRI soft-tissue contrast, with the aim of improving target delineation and reducing radiotherapy planning time.
Methods We used a publicly available multicenter medical dataset (GLIS-RT, 230 patients) from the Cancer Imaging Archive. To improve the model's generalization, we considered different imaging protocols and patients with various brain tumor types, including metastases. The proposed MedFusionGAN consisted of one generator network and one discriminator network trained in an adversarial scenario. Content, style, and L1 losses were used for training the generator to preserve the texture and structure information of the MRI and CT images.
Results MedFusionGAN successfully generated fused images with MRI soft-tissue and CT bone contrast. Its results were quantitatively and qualitatively compared with seven traditional and eight deep learning (DL) state-of-the-art methods. Qualitatively, our method fused the source images with the highest spatial resolution without adding image artifacts. We reported nine quantitative metrics to quantify the preservation of structural similarity, contrast, distortion level, and image edges in the fused images. Our method outperformed both the traditional and DL methods on six out of nine metrics, and it ranked second on three and two metrics when compared with the traditional and DL methods, respectively. To compare soft-tissue contrast, the intensity profiles along the tumor and the tumor contours produced by the fusion methods were evaluated. MedFusionGAN provided a more consistent and better intensity profile and better segmentation performance.
Conclusions The proposed end-to-end unsupervised method successfully fused MRI and CT images. The fused image could improve the delineation of targets and organs at risk (OARs), which is an important aspect of radiotherapy treatment planning.
Supplementary Information The online version contains supplementary material available at 10.1186/s12880-023-01160-w.


I. Quantitative metrics
Brief definitions of the metrics are as follows:
1. Entropy (H): This metric, measured in bits, gives the amount of information required to code a given image. The higher the image entropy, the better the image fusion result. It is defined as follows:
$$H(X) = -\sum_{x} p(x)\,\log_2 p(x), \tag{1}$$
where $p(x)$ is the probability of image intensity $x$ for a given image $X$.
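2. Standard deviation (SD): SD, given in Equation 2, is a statistical measure that provides information about image contrast:
$$SD = \sqrt{\frac{1}{\Omega}\sum_{j=1}^{\Omega}\left(X_j - \mu\right)^2}, \tag{2}$$
where $X_j$ is the intensity of pixel $j$, $\mu$ is the mean image intensity, and $\Omega$ is the image size. For a noiseless image, a higher SD means better image contrast.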
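3. Mean gradient (MG): MG quantifies an image's gradient information, and a higher MG means more edge information. MG is mathematically defined as follows:
$$MG = \frac{1}{\Omega}\sum_{j=1}^{\Omega}\sqrt{\sum_{i}\left(D_i X_j\right)^2}, \tag{3}$$
where $D_i$ is the differentiation along a given dimension $i$, $X_j$ is the intensity of pixel $j$, and $\Omega$ is the image size.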
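4. Spatial frequency (SF): SF measures an image's gradient (image edge) and texture. It is calculated using the row frequency (RF) and column frequency (CF) as follows:
$$SF = \sqrt{RF^2 + CF^2}, \tag{4}$$
where
$$RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left(X(i,j) - X(i,j-1)\right)^2}, \qquad CF = \sqrt{\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=1}^{N}\left(X(i,j) - X(i-1,j)\right)^2}$$
for an image $X$ of size $M \times N$. An image with a higher SF provides more image edge information, and human eyes are more sensitive to images with higher SF.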
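The four single-image metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, an image is a plain nested list of integer intensities, the histogram is taken over exact intensity values, and the mean gradient uses forward differences with differences across the image border treated as zero.

```python
import math
from collections import Counter


def _flatten(img):
    """Return all pixel values of a 2D image (list of rows) as one list."""
    return [v for row in img for v in row]


def entropy(img):
    """Shannon entropy, in bits, of the intensity histogram."""
    pixels = _flatten(img)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def standard_deviation(img):
    """Population standard deviation of the pixel intensities."""
    pixels = _flatten(img)
    mu = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mu) ** 2 for v in pixels) / len(pixels))


def mean_gradient(img):
    """Average gradient magnitude over all pixels.

    Assumption: forward differences, set to zero across the image border.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            dx = img[i][j + 1] - img[i][j] if j + 1 < cols else 0
            dy = img[i + 1][j] - img[i][j] if i + 1 < rows else 0
            total += math.hypot(dx, dy)
    return total / (rows * cols)


def spatial_frequency(img):
    """Square root of the summed squared row and column frequencies."""
    rows, cols = len(img), len(img[0])
    n = rows * cols
    # Row frequency: differences between horizontally adjacent pixels.
    rf2 = sum((img[i][j] - img[i][j - 1]) ** 2
              for i in range(rows) for j in range(1, cols)) / n
    # Column frequency: differences between vertically adjacent pixels.
    cf2 = sum((img[i][j] - img[i - 1][j]) ** 2
              for i in range(1, rows) for j in range(cols)) / n
    return math.sqrt(rf2 + cf2)
```

For real CT or MRI slices the same formulas would normally be evaluated on arrays with a fixed number of histogram bins; nested lists are used here only to keep the sketch free of dependencies.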

5. Mutual information (MI): Consider two images $X$ and $F$ with marginal probability distributions $p_X(x)$ and $p_F(f)$ and joint probability distribution $p_{X,F}(x,f)$. $X$ and $F$ are statistically independent if $p_{X,F}(x,f) = p_X(x)\,p_F(f)$. The mutual information $I(X,F)$ of $X$ and $F$ quantifies the distance between the joint distribution $p_{X,F}(x,f)$ and the distribution under statistical independence, $p_X(x)\,p_F(f)$, using the Kullback-Leibler measure, i.e.,
$$I(X,F) = \sum_{x,f} p_{X,F}(x,f)\,\log_2\frac{p_{X,F}(x,f)}{p_X(x)\,p_F(f)}. \tag{5}$$
It is related to the entropy of the images as follows:
$$I(X,F) = H(X) + H(F) - H(X,F) = H(X) - H(X|F) = H(F) - H(F|X), \tag{6}$$
where $H(\cdot)$, $H(\cdot,\cdot)$, and $H(\cdot|\cdot)$ are the entropy (given in Equation 1), joint entropy ($= -\sum_{x,f} p_{X,F}(x,f)\,\log_2 p_{X,F}(x,f)$), and conditional entropy ($= -\sum_{x,f} p_{X,F}(x,f)\,\log_2 p_{X|F=f}(x)$), respectively.
6. Normalized cross correlation (NCC): NCC measures the linear similarity between two images directly from the image signal intensities. It is calculated as follows:
$$NCC(Z,F) = \frac{\sum_{i,j} Z(i,j)\,F(i,j)}{\sqrt{\sum_{i,j} Z(i,j)^2\,\sum_{i,j} F(i,j)^2}}, \tag{7}$$
where $Z \in \{X, Y\}$. It ranges from 0 to 1, and a greater value means a higher similarity between the source images and the fused image.
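The two definitions of mutual information, and the NCC, can be checked against each other on small images. The sketch below is illustrative only: the function names are hypothetical, images are nested lists of integer intensities, probabilities are estimated from exact-value histograms, and the NCC is assumed to be the uncentered form (sum of products over the root of the product of summed squares), which is the form that stays between 0 and 1 for non-negative intensities.

```python
import math
from collections import Counter


def _flatten(img):
    """Return all pixel values of a 2D image (list of rows) as one list."""
    return [v for row in img for v in row]


def _entropy(counts, n):
    """Entropy in bits of a histogram given as a Counter over n samples."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x_img, f_img):
    """Kullback-Leibler form: joint distribution vs. product of marginals."""
    xs, fs = _flatten(x_img), _flatten(f_img)
    n = len(xs)
    count_x, count_f = Counter(xs), Counter(fs)
    count_xf = Counter(zip(xs, fs))  # joint histogram of co-located pixels
    mi = 0.0
    for (x, f), c in count_xf.items():
        p_xf = c / n
        mi += p_xf * math.log2(p_xf / ((count_x[x] / n) * (count_f[f] / n)))
    return mi


def mutual_information_from_entropies(x_img, f_img):
    """Entropy form: H(X) + H(F) - H(X, F)."""
    xs, fs = _flatten(x_img), _flatten(f_img)
    n = len(xs)
    return (_entropy(Counter(xs), n) + _entropy(Counter(fs), n)
            - _entropy(Counter(zip(xs, fs)), n))


def ncc(z_img, f_img):
    """Uncentered normalized cross correlation (assumed form)."""
    zs, fs = _flatten(z_img), _flatten(f_img)
    num = sum(z * f for z, f in zip(zs, fs))
    den = math.sqrt(sum(z * z for z in zs) * sum(f * f for f in fs))
    return num / den if den else 0.0
```

Both mutual information functions should agree up to floating-point error, which gives a simple consistency check between the Kullback-Leibler and entropy formulations.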
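7. Peak signal-to-noise ratio (PSNR): PSNR, calculated in decibels, measures the fidelity of the image representation. It is the ratio between the maximum signal intensity power and the noise distortion power, i.e.,
$$PSNR = 10\log_{10}\frac{s^2}{MSE}, \tag{8}$$
where $s$ is the maximum signal intensity. MSE, given in Equation 9, is the mean squared error that measures the image similarity:
$$MSE = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(Z(i,j) - F(i,j)\right)^2, \tag{9}$$
where $Z \in \{X, Y\}$ is a source image and $F$ is the fused image, both of size $M \times N$.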
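A minimal sketch of the PSNR computation follows. The function names and the default maximum intensity of 255 (8-bit images) are assumptions made for illustration, and the error is computed between a single source image and the fused image.

```python
import math


def mse(z_img, f_img):
    """Mean squared error between two equally sized 2D images."""
    diffs = [(z - f) ** 2
             for z_row, f_row in zip(z_img, f_img)
             for z, f in zip(z_row, f_row)]
    return sum(diffs) / len(diffs)


def psnr(z_img, f_img, s=255):
    """Peak signal-to-noise ratio in decibels; s is the maximum intensity."""
    err = mse(z_img, f_img)
    if err == 0:
        return math.inf  # identical images: no distortion power
    return 10 * math.log10(s ** 2 / err)
```

Returning infinity for identical images is a design choice of this sketch; some implementations cap the value or raise an error instead.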
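8. $Q^{XY/F}$: $Q^{XY/F}$ quantifies the amount of edge information preserved in the fused image.
Considering the source images $X$ and $Y$ and the fused image $F$, the edge strength $g(i,j)$ and orientation $\alpha(i,j)$ of each pixel are calculated as follows:
$$g_Z(i,j) = \sqrt{\left(D_x Z(i,j)\right)^2 + \left(D_y Z(i,j)\right)^2}, \qquad \alpha_Z(i,j) = \tan^{-1}\frac{D_y Z(i,j)}{D_x Z(i,j)}, \tag{10}$$
where $Z \in \{X, Y\}$ and $D_i Z$ is the differential of $Z$ along dimension $i$; $g_F$ and $\alpha_F$ are obtained in the same way for the fused image.
The relative edge strength ($G^{ZF}$) and orientation ($A^{ZF}$) of an input image $Z$ ($\in \{X, Y\}$) with respect to $F$ are as follows:
$$G^{ZF}(i,j) = \begin{cases} \dfrac{g_F(i,j)}{g_Z(i,j)}, & g_Z(i,j) > g_F(i,j) \\[2ex] \dfrac{g_Z(i,j)}{g_F(i,j)}, & \text{otherwise} \end{cases}, \qquad A^{ZF}(i,j) = 1 - \frac{\left|\alpha_Z(i,j) - \alpha_F(i,j)\right|}{\pi/2}. \tag{11}$$
Then, we can estimate the perceptual information loss in the fused image for an input image $Z$ ($\in \{X, Y\}$) as follows:
$$Q_g^{ZF}(i,j) = \frac{\Gamma_g}{1 + e^{k_g\left(G^{ZF}(i,j) - \sigma_g\right)}}, \qquad Q_\alpha^{ZF}(i,j) = \frac{\Gamma_\alpha}{1 + e^{k_\alpha\left(A^{ZF}(i,j) - \sigma_\alpha\right)}}, \tag{12}$$
where the constants $\Gamma_g$, $k_g$, $\sigma_g$, $\Gamma_\alpha$, $k_\alpha$, and $\sigma_\alpha$ determine the exact shape of the sigmoid functions that form the edge strength and orientation preservation values. Equation 13 calculates the edge information preservation:
$$Q^{ZF}(i,j) = Q_g^{ZF}(i,j)\,Q_\alpha^{ZF}(i,j), \tag{13}$$
where $Q^{ZF} = 1$ means lossless image fusion.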
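The per-pixel steps described so far (edge strength and orientation, their relative values, and the sigmoid-shaped preservation terms) can be sketched as follows. This is an illustration under explicit assumptions rather than the implementation used in the study: the function names are hypothetical, the inputs are the already computed differentials of one pixel, and the six constants are values commonly quoted for this metric in the literature, not values stated in this text.

```python
import math

# Assumed sigmoid constants (commonly quoted values, not given in this text).
GAMMA_G, K_G, SIGMA_G = 0.9994, -15.0, 0.5
GAMMA_A, K_A, SIGMA_A = 0.9879, -22.0, 0.8


def edge_strength_orientation(dx, dy):
    """Edge strength g and orientation alpha from the two differentials."""
    g = math.hypot(dx, dy)
    if dx != 0:
        alpha = math.atan(dy / dx)
    else:
        # Vertical gradient: +/- pi/2, or 0 when there is no edge at all.
        alpha = math.copysign(math.pi / 2, dy) if dy != 0 else 0.0
    return g, alpha


def edge_preservation(dz, df):
    """Edge information preservation at one pixel.

    dz and df are (dx, dy) differentials of the source image Z and the
    fused image F at the same pixel.
    """
    g_z, a_z = edge_strength_orientation(*dz)
    g_f, a_f = edge_strength_orientation(*df)
    # Relative edge strength: ratio of the weaker to the stronger edge.
    if g_z == g_f:
        rel_g = 1.0  # also covers the case where both are zero
    elif g_z > g_f:
        rel_g = g_f / g_z
    else:
        rel_g = g_z / g_f
    # Relative orientation: 1 when the orientations coincide.
    rel_a = 1.0 - abs(a_z - a_f) / (math.pi / 2)
    q_g = GAMMA_G / (1.0 + math.exp(K_G * (rel_g - SIGMA_G)))
    q_a = GAMMA_A / (1.0 + math.exp(K_A * (rel_a - SIGMA_A)))
    return q_g * q_a
```

With these assumed constants, a pixel whose edge is reproduced exactly in the fused image scores close to, but slightly below, 1, while an edge that vanishes in the fused image scores close to 0.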
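Finally, we obtain the fusion edge preservation from the weighted sum of the edge information, i.e.,
$$Q^{XY/F} = \frac{\sum_{i,j}\left(Q^{XF}(i,j)\,w^X(i,j) + Q^{YF}(i,j)\,w^Y(i,j)\right)}{\sum_{i,j}\left(w^X(i,j) + w^Y(i,j)\right)}, \tag{14}$$
where $w^Z(i,j)$ is the weight assigned to the edge information of source image $Z$ at pixel $(i,j)$, and a higher value means a better image fusion with lower information loss.
9. Structural similarity index (SSIM): SSIM assesses the structural similarity of two images. It consists of three pixelwise comparisons: luminance, contrast, and structure. SSIM for two images $Z$ and $F$ is defined as follows:
$$SSIM(Z,F) = \frac{\left(2\mu_Z\mu_F + C_1\right)\left(2\sigma_{ZF} + C_2\right)}{\left(\mu_Z^2 + \mu_F^2 + C_1\right)\left(\sigma_Z^2 + \sigma_F^2 + C_2\right)}, \tag{15}$$
where $\mu_Z$, $\sigma_Z$, and $\sigma_{ZF}$ are the local mean of $Z$, the local standard deviation of $Z$, and the local covariance between images $Z$ and $F$, respectively, with $\mu_F$ and $\sigma_F$ defined analogously for $F$. $C_1$ and $C_2$ are constants that stabilize the SSIM. The SSIM between the source images and the fused image is a weighted sum of Equation 15 and is defined as: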